Olympia
Unsupervised decoding of encoded reasoning using language model interpretability
As large language models become increasingly capable, there is growing concern that they may develop reasoning processes that are encoded or hidden from human oversight. To investigate whether current interpretability techniques can penetrate such encoded reasoning, we construct a controlled testbed by fine-tuning a reasoning model (DeepSeek-R1-Distill-Llama-70B) to perform chain-of-thought reasoning in ROT-13 encryption while maintaining intelligible English outputs. We evaluate mechanistic interpretability methods--in particular, logit lens analysis--on their ability to decode the model's hidden reasoning process using only internal activations. We show that logit lens can effectively translate encoded reasoning, with accuracy peaking in intermediate-to-late layers. Finally, we develop a fully unsupervised decoding pipeline that combines logit lens with automated paraphrasing, achieving substantial accuracy in reconstructing complete reasoning transcripts from internal model representations. These findings suggest that current mechanistic interpretability techniques may be more robust to simple forms of encoded reasoning than previously understood. Our work provides an initial framework for evaluating interpretability methods against models that reason in non-human-readable formats, contributing to the broader challenge of maintaining oversight over increasingly capable AI systems.
- North America > United States > Illinois > Sangamon County > Springfield (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.07)
- North America > United States > California > Sacramento County > Sacramento (0.05)
- (22 more...)
Predicting Delayed Trajectories Using Network Features: A Study on the Dutch Railway Network
Kampere, Merel, Alsahag, Ali Mohammed Mansoor
The Dutch railway network is one of the busiest in the world, with delays being a prominent concern for the principal passenger railway operator NS. This research addresses a gap in delay prediction studies within the Dutch railway network by employing an XGBoost Classifier with a focus on topological features. Current research predominantly emphasizes short-term predictions and neglects the broader network-wide patterns essential for mitigating ripple effects. This research implements and improves an existing methodology, originally designed to forecast the evolution of the fast-changing US air network, to predict delays in the Dutch Railways. By integrating Node Centrality Measures and comparing multiple classifiers like RandomForest, DecisionTree, GradientBoosting, AdaBoost, and LogisticRegression, the goal is to predict delayed trajectories. However, the results reveal limited performance, especially in non-simultaneous testing scenarios, suggesting the necessity for more context-specific adaptations. Regardless, this research contributes to the understanding of transportation network evaluation and proposes future directions for developing more robust predictive models for delays.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.05)
- Europe > Netherlands > South Holland > Leiden (0.04)
- (35 more...)
WavePulse: Real-time Content Analytics of Radio Livestreams
Mittal, Govind, Gupta, Sarthak, Wagle, Shruti, Chopra, Chirag, DeMattee, Anthony J, Memon, Nasir, Ahamad, Mustaque, Hegde, Chinmay
Radio remains a pervasive medium for mass information dissemination, with AM/FM stations reaching more Americans than either smartphone-based social networking or live television. Increasingly, radio broadcasts are also streamed online and accessed over the Internet. We present WavePulse, a framework that records, documents, and analyzes radio content in real-time. While our framework is generally applicable, we showcase the efficacy of WavePulse in a collaborative project with a team of political scientists focusing on the 2024 Presidential Elections. We use WavePulse to monitor livestreams of 396 news radio stations over a period of three months, processing close to 500,000 hours of audio streams. These streams were converted into time-stamped, diarized transcripts and analyzed to track answer key political science questions at both the national and state levels. Our analysis revealed how local issues interacted with national trends, providing insights into information flow. Our results demonstrate WavePulse's efficacy in capturing and analyzing content from radio livestreams sourced from the Web. Code and dataset can be accessed at \url{https://wave-pulse.io}.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New York > Kings County > New York City (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (215 more...)
- Media > Radio (1.00)
- Leisure & Entertainment (1.00)
- Government > Voting & Elections (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
VILENS: Visual, Inertial, Lidar, and Leg Odometry for All-Terrain Legged Robots
Wisth, David, Camurri, Marco, Fallon, Maurice
We present visual inertial lidar legged navigation system (VILENS), an odometry system for legged robots based on factor graphs. The key novelty is the tight fusion of four different sensor modalities to achieve reliable operation when the individual sensors would otherwise produce degenerate estimation. To minimize leg odometry drift, we extend the robot's state with a linear velocity bias term, which is estimated online. This bias is observable because of the tight fusion of this preintegrated velocity factor with vision, lidar, and inertial measurement unit (IMU) factors. Extensive experimental validation on different ANYmal quadruped robots is presented, for a total duration of 2 h and 1.8 km traveled. The experiments involved dynamic locomotion over loose rocks, slopes, and mud, which caused challenges such as slippage and terrain deformation. Perceptual challenges included dark and dusty underground caverns, and open and feature-deprived areas. We show an average improvement of 62% translational and 51% rotational errors compared to a state-of-the-art loosely coupled approach. To demonstrate its robustness, VILENS was also integrated with a perceptive controller and a local path planner.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Europe > United Kingdom > England > Wiltshire (0.04)
- (7 more...)
Learning to Correct for QA Reasoning with Black-box LLMs
Kim, Jaehyung, Kim, Dongyoung, Yang, Yiming
An open challenge in recent machine learning is about how to improve the reasoning capability of large language models (LLMs) in a black-box setting, i.e., without access to detailed information such as output token probabilities. Existing approaches either rely on accessibility (which is often unrealistic) or involve significantly increased train- and inference-time costs. This paper addresses those limitations or shortcomings by proposing a novel approach, namely CoBB (Correct for improving QA reasoning of Black-Box LLMs). It uses a trained adaptation model to perform a seq2seq mapping from the often-imperfect reasonings of the original black-box LLM to the correct or improved reasonings. Specifically, the adaptation model is initialized with a relatively small open-source LLM and adapted over a collection of sub-sampled training pairs. To select the representative pairs of correct and incorrect reasonings, we formulated the dataset construction as an optimization problem that minimizes the statistical divergence between the sampled subset and the entire collection, and solved it via a genetic algorithm. We then train the adaptation model over the sampled pairs by contrasting the likelihoods of correct and incorrect reasonings. Our experimental results demonstrate that CoBB significantly improves reasoning accuracy across various QA benchmarks, compared to the best-performing adaptation baselines.
- Asia > Pakistan > Sindh > Karachi Division > Karachi (0.04)
- North America > United States > Washington > Thurston County > Olympia (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan (0.04)
AI Sustainability in Practice Part One: Foundations for Sustainable AI Projects
Leslie, David, Rincon, Cami, Briggs, Morgan, Perini, Antonella, Jayadeva, Smera, Borda, Ann, Bennett, SJ, Burr, Christopher, Aitken, Mhairi, Katell, Michael, Fischer, Claudia, Wong, Janis, Garcia, Ismael Kherroubi
Sustainable AI projects are continuously responsive to the transformative effects as well as short-, medium-, and long-term impacts on individuals and society that the design, development, and deployment of AI technologies may have. Projects, which centre AI Sustainability, ensure that values-led, collaborative, and anticipatory reflection both guides the assessment of potential social and ethical impacts and steers responsible innovation practices. This workbook is the first part of a pair that provides the concepts and tools needed to put AI Sustainability into practice. It introduces the SUM Values, which help AI project teams to assess the potential societal impacts and ethical permissibility of their projects. It then presents a Stakeholder Engagement Process (SEP), which provides tools to facilitate proportionate engagement of and input from stakeholders with an emphasis on equitable and meaningful participation and positionality awareness.
- Europe > United Kingdom (0.92)
- North America > United States > District of Columbia > Washington (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- (7 more...)
- Research Report (1.00)
- Instructional Material > Course Syllabus & Notes (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Applied AI (1.00)
Active Prompting with Chain-of-Thought for Large Language Models
Diao, Shizhe, Wang, Pengcheng, Lin, Yong, Zhang, Tong
The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- (8 more...)
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Wang, Zhenhailong, Pan, Xiaoman, Yu, Dian, Yu, Dong, Chen, Jianshu, Ji, Heng
Although large language models have achieved impressive zero-shot ability, the huge model size generally incurs high cost. Recently, semi-parametric language models, which augment a smaller language model with an external retriever, have demonstrated promising language modeling capabilities. However, it remains unclear whether such semi-parametric language models can perform competitively well as their fully-parametric counterparts on zero-shot generalization to downstream tasks. In this work, we introduce $\text{Zemi}$, a zero-shot semi-parametric language model. To our best knowledge, this is the first semi-parametric language model that can demonstrate strong zero-shot performance on a wide range of held-out unseen tasks. We train $\text{Zemi}$ with a novel semi-parametric multitask prompted training paradigm, which shows significant improvement compared with the parametric multitask training as proposed by T0. Specifically, we augment the multitask training and zero-shot evaluation with retrieval from a large-scale task-agnostic unlabeled corpus. In order to incorporate multiple potentially noisy retrieved augmentations, we further propose a novel $\text{augmentation fusion}$ module leveraging perceiver resampler and gated cross-attention. Notably, our proposed $\text{Zemi}_\text{LARGE}$ outperforms T0-3B by 16% on all seven evaluation tasks while being 3.9x smaller in model size.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Washington > Thurston County > Olympia (0.04)
- (2 more...)
- Leisure & Entertainment > Sports > Football (1.00)
- Media > Film (0.93)
- Education (0.68)
- (2 more...)
New methods for metastimuli: architecture, embeddings, and neural network optimization
Picone, Rico A. R., Webb, Dane, Obierefu, Finbarr, Lentz, Jotham
Six significant new methodological developments of the previously-presented "metastimuli architecture" for human learning through machine learning of spatially correlated structural position within a user's personal information management system (PIMS), providing the basis for haptic metastimuli, are presented. These include architectural innovation, recurrent (RNN) artificial neural network (ANN) application, a variety of atom embedding techniques (including a novel technique we call "nabla" embedding inspired by linguistics), ANN hyper-parameter (one that affects the network but is not trained, e.g. the learning rate) optimization, and meta-parameter (one that determines the system performance but is not trained and not a hyper-parameter, e.g. the atom embedding technique) optimization for exploring the large design space. A technique for using the system for automatic atom categorization in a user's PIMS is outlined. ANN training and hyper- and meta-parameter optimization results are presented and discussed in service of methodological recommendations.
- North America > United States > Washington > Thurston County > Olympia (0.04)
- North America > United States > Washington > Thurston County > Lacey (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (6 more...)
Listen to AI Tries to Save the Whales
"We head to the Pacific northwest to understand the obstacles that confront these endangered orcas and how researchers are using artificial intelligence to help orcas and humans to coexist. WHAT HAPPENED TO J thirty five or Tala wasn't an anomaly the southern resident cavs have been struggling to survive for some time they've been listed as endangered in both the US and Canada since the mid arts. But their numbers continue to fall in two, thousand five there were eight. Now there are just seventy two in the wild one lives in captivity. Their home waters in the sailor, see an elaborate network of channels that span the coasts of Seattle Vancouver from Olympia Washington in the south to the middle of Vancouver Island British Columbia in the north. The see encompasses puget sound the Strait of Georgia and the Strait of Juan De. Much of it is rich in natural beauty and teeming with wildlife with rural shorelines backlit by tall evergreens and craggy.
- North America > Canada > British Columbia (0.26)
- Pacific Ocean > North Pacific Ocean > Puget Sound (0.25)
- North America > United States > Washington > Thurston County > Olympia (0.25)
- (3 more...)